Abstract

We are given a set of rectangular macros with fixed dimensions, together with wires connecting some pairs (or
sets) of them. The macros must be placed in a placement area without overlaps so that the total length of the
wires is minimized. We present a heuristic algorithm which uses a special data structure for representing
two-dimensional step functions. This data structure supports fast integral computation and function modification
over rectangles. Our heuristics, and especially our data structure for two-dimensional functions, may be useful in
other applications as well.


1 Introduction 

A chip is composed of basic elements called cells, circuits, boxes or modules. They usually have a rectangular shape, 
contain several transistors and internal connections, and have (at least two) fixed pins. There is a netlist describing 
which pin should be connected to which other pins. The goal is to place the cells legally - without overlaps - in 
the chip area so as to minimize the total (weighted) length of the wires connecting the pins. This problem is also 
called the VLSI placement problem. 

Finding the optimum is NP-hard, therefore, we present a heuristic algorithm based on primal-dual optimization 
inspired by the Hungarian Algorithm for the minimum weight maximum matching problem. Namely, we use a 
cost function as dual function on the placement area, and we are looking for a placement minimizing the sum of the 
total netlength and the total costs of the areas covered by the macros. We try to find a non-negative cost function 
by which an almost optimal placement is legal even if we allow overlaps, and costs are counted with multiplicity. 
We will use an iterated algorithm on the space of primal-dual pairs based on the following two steps: 

1. For every overlap, we increase the cost function under intersecting areas. 

2. We try to find a better placement with respect to the new cost function. 

We tried to focus on typical instances in practice, and we found that these have the following properties. 

1. The placement area is not very large compared to the total size of the macros, but it is still easy to find a 
legal placement. 

2. There are a few hundred macros, and every macro is contained in at most 10 nets.

3. Most of the nets connect two, sometimes three, and rarely more than three macros to each other. 

Our method is optimized for such inputs. 

This paper is organized as follows. In Section 2 we introduce some notation and give a formal definition of
the macro placement problem. In Section 3 we describe the basic idea behind our algorithm and in Section 4 we
present the algorithm. In Section 5 we describe some additional heuristics used in our placer.


2 The macro placement problem 

Now we give a formal definition of the simplified macro placement problem. Let us denote by $\mathcal{M}$ the set of macros.
We assume that all pins of each macro are in the center of the macro. The place of a macro is identified with the
place of its center pin. For a macro $M$, denote its horizontal and vertical size by $size_x(M)$ and $size_y(M)$, respectively.
For a macro $M$ at $(x, y)$, we denote the area occupied by $M$ by

$$S(M,(x,y)) = \left[x - \frac{size_x(M)}{2},\ x + \frac{size_x(M)}{2}\right] \times \left[y - \frac{size_y(M)}{2},\ y + \frac{size_y(M)}{2}\right].$$

A net $N$ is a subset of the macros that are connected.
Definition 1. A netlist is a pair $(\mathcal{M}, \mathcal{N})$ where $\mathcal{M}$ is a finite set of macros and $\mathcal{N} \subseteq \mathcal{P}(\mathcal{M})$ is a set of subsets of
$\mathcal{M}$.

One can think of $\mathcal{N}$ as a hypergraph on $\mathcal{M}$, where each $N \in \mathcal{N}$ is a hyperedge. We assume $|N| \ge 2$ for each
$N \in \mathcal{N}$.

Definition 2. The placement area is a rectangle denoted by $A$. It contains a set of rectangular blockages $\mathcal{B}$
(where $B \subseteq A$ for all $B \in \mathcal{B}$). The sides of all rectangles are parallel to the axes.

A blockage is a part of the placement area where no macro can be placed. 

Definition 3. A placement is a map $p : \mathcal{M} \to A$. The placement $p$ is legal if all of the following hold.

• Every macro $M \in \mathcal{M}$ is placed in the placement area:

$$S(M, p(M)) \subseteq A.$$

• The places of any two distinct macros $M, M' \in \mathcal{M}$ are disjoint:

$$S(M, p(M)) \cap S(M', p(M')) = \emptyset.$$

• None of the macros $M \in \mathcal{M}$ is placed on a blockage $B \in \mathcal{B}$:

$$S(M, p(M)) \cap B = \emptyset.$$

The macros have to be placed in the given orientation; they cannot be rotated. Let $(\mathcal{M}, \mathcal{N})$ be a netlist
and $p$ a legal placement in the placement area $A$ with blockages $\mathcal{B}$. Define $p$ on the set of nets $\mathcal{N}$ as follows. For
$N = \{M_1, M_2, \dots, M_k\} \in \mathcal{N}$, let $p(N) = (p(M_1), p(M_2), \dots, p(M_k))$. We have a function $\mathcal{L} : \bigcup_{k} A^k \to \mathbb{R}_+$
which evaluates the length of a net. $\mathcal{L}$ is also called the net (or netlength) model. One commonly used net model
is the bounding-box model:

$$BB\big((x_1,y_1),(x_2,y_2),\dots,(x_k,y_k)\big) = \max_i\{x_i\} - \min_i\{x_i\} + \max_i\{y_i\} - \min_i\{y_i\} \qquad (1)$$

This is the half perimeter of the smallest rectangle with sides parallel to the axes, containing all pins of the
macros contained in the net $N$.
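As a small illustration of the bounding-box model (1), the following Python sketch evaluates it for a list of pin coordinates. The function name `bounding_box_length` and the representation of pins as `(x, y)` tuples are assumptions made for this example only.

```python
def bounding_box_length(pins):
    """Half perimeter of the smallest axis-parallel rectangle containing all pins.

    `pins` is a non-empty list of (x, y) tuples (an assumed representation).
    """
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# Three pins: the x-extent is 3 and the y-extent is 4.
example_length = bounding_box_length([(0, 0), (3, 4), (1, 1)])
```

For a net with two pins this reduces to the Manhattan distance between them.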

The Simplified Placement Problem: 

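Given a netlist $(\mathcal{M}, \mathcal{N})$, a placement area $A$, the set of blockages $\mathcal{B}$ and a net model $\mathcal{L}$, find a legal placement
$p : \mathcal{M} \to A$ which minimizes the total netlength:

$$\sum_{N \in \mathcal{N}} \mathcal{L}(p(N)).$$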



3 Basic tools of the placer 

The initial problem is to place macros in the placement area (avoiding the blocked areas) so that the total netlength
is minimal (or close to the minimum). As finding the optimum is NP-hard, we present a heuristic algorithm with
$O(\log(n)\log(m)\,s)$ running time, where the placement area is a discrete $n \times m$ grid and we run the algorithm for $s$
rounds.

We introduce our algorithm in several steps. 
Problem 1. We have a set of macros $\mathcal{M}$ and a set of disjoint slots $\mathcal{A}$. There is a cost function $c$ on $\mathcal{M} \times \mathcal{A}$ which
assigns a cost to every possible macro-slot assignment. Find an injective assignment $p : \mathcal{M} \to \mathcal{A}$ with minimum
total cost

$$\sum_{M \in \mathcal{M}} c(M, p(M)).$$

Solution. This scenario can be represented by a weighted bipartite graph. The two vertex classes are $\mathcal{M}$ and $\mathcal{A}$, and
the cost of an edge $(M, A)$ is $c(M, A)$. The task is to cover $\mathcal{M}$ with a minimum cost maximum matching. This is
a well-known optimization problem, and it is usually solved by primal-dual methods (e.g. the Hungarian Method
[1]). However, we choose the following method instead, because we will generalize it in the later steps. We try to
find a primal-dual solution by the following market simulation: if there is an area which is the best possible choice
for at least two macros, then we increase the cost of that area.
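The market simulation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the placer: the function name `market_assignment`, the fixed price increment `step`, the tie-breaking by lowest slot index and the round limit are all choices made for this example.

```python
def market_assignment(cost, step=1.0, max_rounds=10_000):
    """Raise the price of contested slots until every macro prefers a different slot.

    cost[m][a] is the cost of putting macro m into slot a. Returns (choice, price),
    where choice[m] is the slot picked by macro m, or (None, price) if the round
    limit is reached first (termination is not guaranteed by this sketch).
    """
    n_macros, n_slots = len(cost), len(cost[0])
    price = [0.0] * n_slots  # dual variables: one price per slot
    for _ in range(max_rounds):
        # Every macro picks its currently cheapest slot (ties: lowest index).
        choice = [
            min(range(n_slots), key=lambda a: cost[m][a] + price[a])
            for m in range(n_macros)
        ]
        demand = {}
        for a in choice:
            demand[a] = demand.get(a, 0) + 1
        contested = [a for a, k in demand.items() if k >= 2]
        if not contested:
            return choice, price
        for a in contested:
            price[a] += step  # make the crowded slot less attractive
    return None, price


choice, price = market_assignment([[1, 5, 9], [2, 3, 9], [1, 9, 4]])
```

In this small instance all three macros initially prefer slot 0; its price rises until two of them switch to their second-best slots.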

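Problem 2. For a macro $M \in \mathcal{M}$ and a given netlist $\mathcal{N}$, let $E(M)$ be the set of nets containing $M$:

$$E(M) = \{N \in \mathcal{N} \mid M \in N\},$$

and let $N(M)$ be the set of its neighbors:

$$N(M) = \{M' \in \mathcal{M} \mid \exists N \in E(M) : M' \in N\}.$$

Let $\mathcal{L}(N)$ be the netlength model. Given a set of macros $\mathcal{M}$, disjoint slots $\mathcal{A}$ and netlist $\mathcal{N}$, find an injective
assignment $p : \mathcal{M} \to \mathcal{A}$ which minimizes

$$\sum_{N \in \mathcal{N}} \mathcal{L}(p(N)).$$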

Solution. We try to use the solution of Problem 1, where the cost of the placement of one macro is replaced by
its marginal contribution to the total cost. Now the cost of one macro depends on the placement of its neighbors,
but there are not too many neighbors; therefore, we expect the cost function to change rather slowly. This allows
us to use a variation of the market simulation described above:

1. Take an arbitrary macro-area assignment $p$.

2. For each macro $M \in \mathcal{M}$, fix the other macros at their current position. For every $A \in \mathcal{A}$, we get a placement
$p_{(M,A)}$ from $p$ by changing the assignment of the macro $M$ to $A$. Let us define the marginal contribution of
$M$ placed at $A$ by

$$C_p(M,A) = Cost(A) + \sum_{N \in E(M)} \mathcal{L}(p_{(M,A)}(N)).$$

3. Use the method described in the solution of Problem 1 with cost function $C_p$ for $\mathcal{M}, \mathcal{A}$ to get a better
assignment $p'$.

4. Continue with step 2 using the assignment $p'$ obtained in step 3.

We run this procedure for a number of rounds. 
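The marginal contribution used in step 2 can be computed directly from its definition. The sketch below assumes the bounding-box net model, slots given by their center coordinates, and dictionary-based containers; the names `marginal_cost`, `slot_xy` and `slot_cost` are hypothetical.

```python
def bb(points):
    """Bounding-box (half-perimeter) length of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def marginal_cost(macro, slot, placement, nets, slot_xy, slot_cost):
    """Cost(A) plus the length of every net containing `macro`, with `macro` moved to `slot`.

    placement maps macro -> slot, slot_xy maps slot -> (x, y), slot_cost maps slot -> cost.
    All other macros stay where `placement` puts them.
    """
    total = slot_cost[slot]
    for net in nets:
        if macro in net:
            pts = [slot_xy[slot] if m == macro else slot_xy[placement[m]] for m in net]
            total += bb(pts)
    return total


slot_xy = {0: (0, 0), 1: (5, 0), 2: (5, 5)}
slot_cost = {0: 1.0, 1: 0.0, 2: 2.0}
placement = {"a": 0, "b": 1, "c": 2}
nets = [("a", "b"), ("a", "c"), ("b", "c")]
stay = marginal_cost("a", 0, placement, nets, slot_xy, slot_cost)
move = marginal_cost("a", 1, placement, nets, slot_xy, slot_cost)
```

Only the nets in $E(M)$ are touched, so one evaluation costs time proportional to the total size of the nets containing the macro.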

Remark 1. With no further adjustment, the algorithm can easily run into an infinite loop, as the following example
shows.

Consider two macros and let the netlist be one single net connecting the two macros. During the run of the algorithm,
if the two macros are at different positions, the macro with the higher cost moves to the area where the other
macro is, because this decreases the total netlength and the total cost of the macros. The cost of this area increases
until one of them moves to another place. As before, the net connecting this macro to the other causes the other
macro to move to the same area as well. Therefore, the same process starts again. This shows that increasing the
cost under the overlaps alone is not enough.

After each round of the algorithm, if there are at least two macros at the same area, we increase the cost of that
area. At the beginning of the algorithm we allow overlaps to get a better placement, basically accepting not only
better but also slightly worse placements, to prevent the algorithm from getting stuck early in some local minimum. Later,
we increasingly punish overlaps to prevent the loop in Remark 1. This is a kind of cooling process. If we set
the increment rate properly, the macros will have time to distribute evenly in the placement area, with small
netlength. Later, this punishment goes to infinity, thereby enforcing a legal solution.
Problem 3. We are given a set of macros $\mathcal{M}$, a netlist $\mathcal{N}$, a placement area $A$ which is a discrete $n \times m$ grid, and a
netlength model $\mathcal{L}$. Each edge length of each macro is a multiple of the edge length of the grid. A placement
$p : \mathcal{M} \to A$ is legal if $S(M_1, p(M_1)) \cap S(M_2, p(M_2)) = \emptyset$ whenever $M_1 \neq M_2$. During the run of the algorithm we allow
non-legal placements. We only require that the final placement $p$ is legal. Our task is to find a legal placement
which minimizes

$$\sum_{N \in \mathcal{N}} \mathcal{L}(p(N)).$$

Solution. Here, unlike in Problem 2, the places are not disjoint, so during the run of the algorithm the macros can
also overlap partially. In this case we increase the cost under the intersection in proportion to its size.
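The quantity that drives this cost increase is the area of the intersection of two occupied rectangles. A minimal sketch, assuming rectangles are stored as `(x_lo, y_lo, x_hi, y_hi)` tuples and that the increment is a hypothetical factor `rate` times the overlap area (neither choice is taken from the text above):

```python
def occupied(center, size):
    """Rectangle occupied by a macro with the given center (x, y) and size (w, h)."""
    (x, y), (w, h) = center, size
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def overlap_area(r1, r2):
    """Area of the intersection of two axis-parallel rectangles (0 if they are disjoint)."""
    w = min(r1[2], r2[2]) - max(r1[0], r2[0])
    h = min(r1[3], r2[3]) - max(r1[1], r2[1])
    return max(w, 0) * max(h, 0)


def cost_increment(rate, r1, r2):
    """Cost increase proportional to the size of the overlap (rate is an assumed parameter)."""
    return rate * overlap_area(r1, r2)


r_a = occupied((2, 2), (4, 4))   # covers [0, 4] x [0, 4]
r_b = occupied((4, 4), (4, 4))   # covers [2, 6] x [2, 6]
shared = overlap_area(r_a, r_b)  # the common part is [2, 4] x [2, 4]
```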


Problem 4. In the general setting the sizes can be real numbers. 


Solution. To use the solution of Problem 3, we divide the placement area into a sufficiently fine discrete grid and
use only natural numbers for approximation. We round up the edge length of each macro to the nearest multiple
of the edge length of the grid.
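The rounding step can be written as follows. The helper names are hypothetical, and the sketch ignores floating-point edge cases (a length that is a multiple of the cell size only up to rounding error may be rounded up by one extra cell).

```python
import math


def cells_needed(length, cell):
    """Number of grid cells needed to cover an edge of the given length."""
    return math.ceil(length / cell)


def round_up_to_grid(length, cell):
    """Smallest multiple of the cell edge length that is at least `length`."""
    return cells_needed(length, cell) * cell


rounded = round_up_to_grid(2.3, 0.5)  # 2.3 needs five cells of width 0.5
```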

4 Our global placer 

The algorithm receives an initial placement (e.g. random with many overlaps) and then refines it to a global 
placement with minimized total netlength. 

4.1 The structure of the algorithm 

Our algorithm proceeds as follows. At the beginning, we generate an initial placement, or we use the given
one. Then, for a given number of rounds, we do the following.

1. We choose a macro $M$ randomly, with original position $X_0$.

2. We generate $t$ possible new positions $X_1, \dots, X_t$ around its original position with move-macro(M).

3. We move the macro $M$ to the position $X_i$ that minimizes

$$weight(M, X_i) + NetLength(E_{X_i}(M)) + penalty(M, X_i), \qquad (2)$$

where $E_{X_i}(M)$ is obtained by moving the macro $M$ to $X_i$.

4. We increase the weights under the overlaps of $M$ with every other macro.

4.2 Notations 

To discretize the problem, we consider the placement area to be a finite $n \times m$ grid. We can assume that $n = 2^p$,
$m = 2^q$. These parameters are free to choose according to the available computing resources. Denote the size of
the placement area by $A_x$ and $A_y$ (for the horizontal and vertical size). After we set $n, m$, we divide the placement
area into an $n \times m$ grid. Our grid will consist of $nm$ cells of dimensions $x \times y = \frac{A_x}{n} \times \frac{A_y}{m}$. We record the cost as
a step function on the placement area which is constant on the cells of the grid. In other words, we define the cost
on the cells of the grid and not as a function on the placement area. Let $P_{i,j}$ be the weight under the $i$th cell
of the $j$th row. Denote by $\mathcal{M}_{n,m}$ the set of all $n \times m$ matrices. Define the inner product of two matrices (say
$A, B \in \mathcal{M}_{n,m}$) as follows:

$$A \star B := \sum_{i=1}^{n} \sum_{j=1}^{m} A_{i,j} B_{i,j}.$$

We represent a rectangle with its top-left and bottom-right corners. For a given rectangle $R = ((x_1, y_1), (x_2, y_2))$,
$x_1 < x_2$, $y_1 < y_2$, we consider the slightly larger rectangle obtained by moving the corners of $R$ outward to the
nearest grid lines. This is the smallest rectangle of the grid covering $R$. From now on let every rectangle be given in the form

$$R = ((a_1 x, b_1 y), (a_2 x, b_2 y)) = ((a_1, b_1), (a_2, b_2)).$$

The characteristic function of $R = ((a_1, b_1), (a_2, b_2))$ is defined as the following $n \times m$ matrix:

$$A_R = (a_{i,j})_{i,j=1}^{n,m}, \qquad a_{i,j} = \begin{cases} 1 & \text{if } a_1 < i \le a_2 \text{ and } b_1 < j \le b_2, \\ 0 & \text{otherwise.} \end{cases}$$


4.3 Data structure for the weights 

We introduce a data structure by which we can calculate $A_R \star P$ in $O(\log(n)\log(m))$ time, and also increase
the cost function by a constant under any rectangle $R$ (as in Problem 4) in $O(\log(n)\log(m))$ time.

Remark 2. We have two operations on $P$.

• $f(R, P) = A_R \star P$ returns the total cost under a given rectangle $R$.

• $g(R, P, w)$ increases each entry of $P$ by $w$ under the rectangle $R$ ($P := P + wA_R$).

In our algorithm, we use these operations in every round, therefore, we need to compute them fast. 
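To fix the semantics of the two operations, here is a naive reference version that stores $P$ as a dense list of lists and touches every cell of the rectangle. It is only a baseline for comparison, not the data structure of this section. The names `total_cost` and `increase_dense` are assumptions, and a rectangle `((a1, b1), (a2, b2))` is taken to cover the zero-based cells `a1 <= i < a2`, `b1 <= j < b2`.

```python
def total_cost(P, rect):
    """f(R, P): sum of the entries of P under the rectangle (time proportional to its area)."""
    (a1, b1), (a2, b2) = rect
    return sum(P[i][j] for i in range(a1, a2) for j in range(b1, b2))


def increase_dense(P, rect, w):
    """g(R, P, w): add w to every entry of P under the rectangle, in place."""
    (a1, b1), (a2, b2) = rect
    for i in range(a1, a2):
        for j in range(b1, b2):
            P[i][j] += w


P = [[0.0] * 4 for _ in range(4)]
increase_dense(P, ((0, 0), (2, 2)), 3.0)  # 4 cells get +3
increase_dense(P, ((1, 1), (4, 4)), 1.0)  # 9 cells get +1
whole = total_cost(P, ((0, 0), (4, 4)))
```

Both operations are linear in the area of the rectangle here, which is what the basis expansion below is meant to avoid.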

We construct an orthogonal basis $\{B_1, B_2, \dots, B_k\}$ in this space ($k = nm$). For a given rectangle $R$, in order to
compute $A_R \star P$, we only need to know the products $B_i \star A_R$ for every $i = 1, \dots, k$. Increasing the entries under $R$
by $w$ in $P = \sum_i \alpha_i B_i$ can be done by increasing the coefficients of the expansion: $\alpha_i := \alpha_i + w\, A_R \star B_i$ (for
basis elements that are not normalized, the increment has to be divided by $B_i \star B_i$). We construct
a basis such that for every rectangle $R$ there are only a few basis elements which are not orthogonal to $A_R$, and
each of these inner products can be computed in constant time. First, consider the one-dimensional array $P = (P_1, \dots, P_n)$
and let $n = 2^p$. Define $B^a_k$ as follows.

$$B^a_k(j) = \begin{cases} 1 & \text{if } (2k-2)2^a < j \le (2k-1)2^a, \\ -1 & \text{if } (2k-1)2^a < j \le 2k\,2^a, \\ 0 & \text{else,} \end{cases}$$

where $a = 0, \dots, p-1$, $k = 1, \dots, 2^{p-a-1}$, $j = 1, \dots, n$. We also consider the constant basis element $B^p_1 \equiv 1$. For example, the
elements for $n = 8$ are


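$$\begin{aligned}
B^0_1 &= (1, -1, 0, 0, 0, 0, 0, 0), & B^0_2 &= (0, 0, 1, -1, 0, 0, 0, 0), \\
B^0_3 &= (0, 0, 0, 0, 1, -1, 0, 0), & B^0_4 &= (0, 0, 0, 0, 0, 0, 1, -1), \\
B^1_1 &= (1, 1, -1, -1, 0, 0, 0, 0), & B^1_2 &= (0, 0, 0, 0, 1, 1, -1, -1), \\
B^2_1 &= (1, 1, 1, 1, -1, -1, -1, -1), & B^3_1 &= (1, 1, 1, 1, 1, 1, 1, 1).
\end{aligned}$$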
Lemma 1. Let $s, t \in \mathbb{N}$, $s < t$. For a given $R_{s,t}$, there are at most $2\log(n)$ elements of the basis $B^a_k$
(not counting the constant basis element) for which $R_{s,t} \star B^a_k \neq 0$.

Proof: It is easy to check that $\{B^a_k\}$ is an orthogonal basis in $\mathbb{R}^n$. Let $R_{s,t} \in \mathbb{R}^n$ be as in the Lemma:

$$R_{s,t}(j) = \begin{cases} 1 & \text{if } s < j \le t, \\ 0 & \text{else.} \end{cases}$$

If $B^a_k \star R_{s,t} \neq 0$ then either (3) or (4) or both hold.

$$(2k-2)2^a < s < 2k\,2^a \qquad (3)$$

$$(2k-2)2^a < t < 2k\,2^a \qquad (4)$$

For each of the $\log(n)$ values of $a$, at most one $k$ satisfies (3) and at most one $k$ satisfies (4). Hence, for a given
$R_{s,t}$, the number of $B^a_k$ for which (3), (4) or both hold is at most $2\log(n)$. This completes the proof
of the lemma. □

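It is easy to compute the scalar product of $R_{s,t}$ and $B^a_k$. Let

$$Star_x(R_{s,t}, B^a_k) = -\min\{s - (2k-2)2^a,\ 2k\,2^a - s\},$$
$$Star_y(R_{s,t}, B^a_k) = \min\{t - (2k-2)2^a,\ 2k\,2^a - t\}.$$

Then the scalar product of $R_{s,t}$ and $B^a_k$ is:

$$R_{s,t} \star B^a_k = \begin{cases} Star_x(R_{s,t}, B^a_k) & \text{if only (3) holds,} \\ Star_y(R_{s,t}, B^a_k) & \text{if only (4) holds,} \\ Star_x(R_{s,t}, B^a_k) + Star_y(R_{s,t}, B^a_k) & \text{if (3) and (4) hold.} \end{cases} \qquad (5)$$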


Now we can get a basis in $\mathcal{M}_{n,m}$ from the one-dimensional case as follows. Let

$$B^{a,b}_{k,l}(i,j) = B^a_k(i) \cdot B^b_l(j),$$

where $B^a_k$ corresponds to the basis in $\mathbb{R}^n$ and $B^b_l$ to the basis in $\mathbb{R}^m$. It is not hard to check that $\{B^{a,b}_{k,l}\}$ is a basis in
$\mathcal{M}_{n,m}$. Following the argument of Lemma 1, for any rectangle $R$, there are at most $O(\log(n)\log(m))$ basis elements
$B^{a,b}_{k,l}$ not orthogonal to $A_R$. Furthermore, from (5), the scalar product $R_{(x_1,x_2)\times(y_1,y_2)} \star B^{a,b}_{k,l}$ can be computed in
constant time.

It is easy to see that $B^{a,b}_{k,l}$ satisfies:

• For every rectangle $R$, $|\{B^{a,b}_{k,l} : A_R \star B^{a,b}_{k,l} \neq 0\}| = O(\log(n)\log(m))$. Furthermore, we can find them in
$O(\log(n)\log(m))$ time.

• For every rectangle $R$, $A_R \star B^{a,b}_{k,l}$ can be computed in constant time.
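The one-dimensional case of this construction can be sketched as follows. The class name `HaarRangeArray`, the zero-based half-open cell ranges `[s, t)` and the storage layout are assumptions of this example. Since the basis elements are orthogonal but not normalized, the sketch divides each increment by $B \star B$, so that the stored coefficients are exactly those of the expansion of $P$. The two-dimensional structure would use products of two such one-dimensional inner products.

```python
class HaarRangeArray:
    """Step function on n = 2**p cells with O(log n) range add and range sum."""

    def __init__(self, p):
        self.p = p
        self.n = 1 << p
        # alpha[a][k] is the coefficient of the element with support
        # [2k * 2**a, (2k + 2) * 2**a): +1 on the first half, -1 on the second.
        self.alpha = [[0.0] * (self.n >> (a + 1)) for a in range(p)]
        self.const = 0.0  # coefficient of the constant (all-ones) element

    @staticmethod
    def _dot(s, t, a, k):
        """Inner product of the indicator of cells [s, t) with element (a, k)."""
        h = 1 << a
        lo, hi = 2 * k * h, (2 * k + 2) * h
        clip = lambda x: min(max(x, lo), hi)
        prefix = lambda x: min(x - lo, hi - x)  # sum of the element over [lo, x)
        return prefix(clip(t)) - prefix(clip(s))

    def _touched(self, s, t):
        """Yield (a, k, dot) for the non-constant elements not orthogonal to [s, t)."""
        for a in range(self.p):
            width = 1 << (a + 1)
            # Only elements whose support contains an endpoint can contribute.
            for k in {s // width, t // width}:
                if k < len(self.alpha[a]):
                    d = self._dot(s, t, a, k)
                    if d:
                        yield a, k, d

    def add(self, s, t, w):
        """Add w to every cell in [s, t)."""
        for a, k, d in self._touched(s, t):
            self.alpha[a][k] += w * d / (1 << (a + 1))  # divide by B * B
        self.const += w * (t - s) / self.n

    def total(self, s, t):
        """Sum of the cells in [s, t)."""
        acc = self.const * (t - s)
        for a, k, d in self._touched(s, t):
            acc += self.alpha[a][k] * d
        return acc


arr = HaarRangeArray(3)  # 8 cells
arr.add(1, 5, 3.0)
arr.add(4, 7, -1.0)
```

Each operation visits at most two elements per level, in line with Lemma 1.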


4.4 Inflation 

During the run of the algorithm, the cost of crowded areas may get too high, so that all macros avoid
those areas. Rather than waiting for the costs of all the other places to increase, we implement a cost reducer, which
reduces the differences between the high- and low-cost areas. We chose the method below because it can be easily
implemented without further computation time. The best rate of inflation should be tuned experimentally.


4.5 The increase(R,value) subroutine 

Let $\alpha = (\alpha^{a,b}_{k,l})$ be a global variable denoting the coefficients of the basis elements $B^{a,b}_{k,l}$ in the expansion
$P = \sum \alpha^{a,b}_{k,l} B^{a,b}_{k,l}$. The increase(R,value) subroutine computes the scalar product of the basis elements $B^{a,b}_{k,l}$ and $A_R$,
and increases the current coefficient of $B^{a,b}_{k,l}$ by this product multiplied by value. We repeat this for all
basis elements that are not orthogonal to $A_R$.


Algorithm 1 increase(R, value)
for {a = 0, ..., log(n)} do
    for {b = 0, ..., log(m)} do
        for {k, l : $B^{a,b}_{k,l} \star A_R \neq 0$} do
            $\alpha^{a,b}_{k,l}$ = $\alpha^{a,b}_{k,l}$ + scalar($A_R$, $B^{a,b}_{k,l}$) * value
        end for
    end for
end for
end for 
In line 3, we can find the pairs $(k,l)$ in constant time as follows. For a given $R$ there are at most 4 pairs $(k, l)$
such that scalar($A_R$, $B^{a,b}_{k,l}$) $\neq 0$. The possible pairs can be found easily from the coordinates of $R$. Fix $a, b$
and a rectangle $R = ((x_1,y_1),(x_2,y_2))$, where $x_1 < x_2$, $y_1 < y_2$. Let $k_i, l_j$ be such that

$$(2k_1 - 2)2^a < x_1 < 2k_1 2^a, \quad (2k_2 - 2)2^a < x_2 < 2k_2 2^a, \quad \text{and}$$
$$(2l_1 - 2)2^b < y_1 < 2l_1 2^b, \quad (2l_2 - 2)2^b < y_2 < 2l_2 2^b$$

hold. The basis elements (with fixed $a, b$) possibly not orthogonal to $A_R$ are $B^{a,b}_{k_1,l_1}$, $B^{a,b}_{k_1,l_2}$, $B^{a,b}_{k_2,l_1}$ and $B^{a,b}_{k_2,l_2}$.

4.6 The cost(R) subroutine 

Here, $Round \in \mathbb{N}$ is a global variable denoting the current round of the algorithm. The cost(R) function receives a
rectangle $R$ and returns the total cost of the cells inside this rectangle. This routine uses the basis expansion for
the cost matrix $P$ in order to compute the scalar product as follows:


Algorithm 2 cost(R)
cost = 0;
for {a = 0, ..., log(n)} do
    for {b = 0, ..., log(m)} do
        for {k, l : $B^{a,b}_{k,l} \star A_R \neq 0$} do
            cost = cost + $\alpha^{a,b}_{k,l}$ * scalar($A_R$, $B^{a,b}_{k,l}$)
        end for
    end for
end for
cost = cost + penalty(Round, R);
return cost;


5 Heuristics 

In this section, we discuss further parameters of the algorithm. We make suggestions for all parameters, but these 
should be experimentally adjusted. 


5.1 The move-macro(M) subroutine 


This routine returns a new possible place for $M$. As before, let $A_x$, $A_y$ denote the horizontal and vertical size of the
placement area $A$. The location of a macro $M$ is given by its placement coordinates $(x, y)$. For a macro $M \in \mathcal{M}$,
let us denote the largest and smallest possible $x$ coordinates for the macro $M$ by

$$x_{max}(M) = A_x - \frac{size_x(M)}{2}, \qquad x_{min}(M) = \frac{size_x(M)}{2}.$$

We define $y_{min}(M)$, $y_{max}(M)$ analogously. Let $\gamma(x) = \exp(\log(x) \cdot U[0,1])$ where $U[0,1]$ is a uniformly
distributed random variable in $[0,1]$. This distribution is our heuristic choice. The subroutine:


Algorithm 3 move_macro(M)
a = Rand{-1, 1}
b = Rand{-1, 1}
x_new = x - γ(x + 1)         if a = 1
x_new = x + γ(x_max - x)     if a = -1
y_new = y - γ(y + 1)         if b = 1
y_new = y + γ(y_max - y)     if b = -1
Return (x_new, y_new)
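A possible Python rendering of this subroutine is given below. It follows the formulas of Algorithm 3, with two additions that are assumptions of this sketch and not part of the text: $\gamma$ returns 0 for non-positive arguments (where the logarithm is undefined), and the result is clamped to the admissible range of the macro. The random generator is passed in explicitly so that runs can be reproduced.

```python
import math
import random


def gamma(z, rng=random):
    """Sample exp(log(z) * U) = z**U with U uniform on [0, 1).

    Returning 0 for z <= 0 is an assumption of this sketch.
    """
    if z <= 0:
        return 0.0
    return math.exp(math.log(z) * rng.random())


def move_macro(x, y, x_min, x_max, y_min, y_max, rng=random):
    """Propose a new center position for a macro currently at (x, y)."""
    a = rng.choice((-1, 1))
    b = rng.choice((-1, 1))
    x_new = x - gamma(x + 1, rng) if a == 1 else x + gamma(x_max - x, rng)
    y_new = y - gamma(y + 1, rng) if b == 1 else y + gamma(y_max - y, rng)
    # Clamping to the admissible range is an added safeguard (assumption).
    x_new = min(max(x_new, x_min), x_max)
    y_new = min(max(y_new, y_min), y_max)
    return x_new, y_new


rng = random.Random(0)
candidates = [move_macro(50.0, 50.0, 5.0, 95.0, 5.0, 95.0, rng) for _ in range(5)]
```

Because the exponent is uniform, the step length is log-uniformly distributed: short moves are proposed about as often as moves that are an order of magnitude longer.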
5.2 The penalty(Step,R) function


Algorithm 4 penalty(Step, R)
cost = 0
for {$M \in \mathcal{M}$, $M \neq R$} do
    cost = cost + c * $\delta_{Step}$ * Circ($M \cap R$)
end for
Return cost


Here, Circ($R$) denotes the perimeter of the rectangle $R$, $c$ is a constant and $\delta_{Step}$ is a parameter.
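A direct transcription of Algorithm 4 might look as follows. Rectangles are assumed to be `(x_lo, y_lo, x_hi, y_hi)` tuples, `delta_step` is passed in as a plain number, and intersections without positive area are assumed to contribute nothing; these are choices made for this sketch.

```python
def intersection_perimeter(r1, r2):
    """Perimeter of the intersection of two rectangles (0 if it has no positive area)."""
    w = min(r1[2], r2[2]) - max(r1[0], r2[0])
    h = min(r1[3], r2[3]) - max(r1[1], r2[1])
    if w <= 0 or h <= 0:
        return 0.0
    return 2.0 * (w + h)


def penalty(rect, other_rects, c, delta_step):
    """Sum of c * delta_step * Circ(M ∩ R) over the rectangles of all other macros."""
    return sum(c * delta_step * intersection_perimeter(m, rect) for m in other_rects)


others = [(2, 2, 6, 6), (10, 10, 12, 12)]
value = penalty((0, 0, 4, 4), others, c=0.5, delta_step=2.0)
```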

5.3 The smooth_edge(E) function

We will consider the bounding-box model only. During the earlier stages of the optimization, when we compare
different positions of a macro and calculate the total length of the wires, we should take into account that the
positions of the neighbors are still rough. Therefore, it turns out to be useful to consider the positions of the
neighboring pins with some uncertainty, namely, as distributions around their present positions. This can be
expressed by using a smoothed version of the absolute value function of the difference in each coordinate. This tool
has already been used in the literature: it is common to approximate the bounding-box model with strictly convex
functions which converge to the bounding-box netlength. One of them is the log-sum-exp function (see [2], [3],

I3])' 

$$LSE_x(N) := \alpha \log\Big(\sum_{p \in N} \exp(x(p)/\alpha)\Big) + \alpha \log\Big(\sum_{p \in N} \exp(-x(p)/\alpha)\Big),$$

and $LSE(N) := LSE_x(N) + LSE_y(N)$. It is easy to see that $LSE(N) \to BB(N)$ as $\alpha \to 0$.
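The convergence of the log-sum-exp model can be checked numerically. The sketch below evaluates the formula with the usual max-shift to avoid overflow for small $\alpha$; the function names and the `(x, y)` pin tuples are assumptions of this example.

```python
import math


def smooth_max(values, alpha):
    """alpha * log(sum(exp(v / alpha))), computed with a max-shift for numerical stability."""
    top = max(values)
    return top + alpha * math.log(sum(math.exp((v - top) / alpha) for v in values))


def lse_length(pins, alpha):
    """Log-sum-exp netlength of a net given as a list of (x, y) pin positions."""
    total = 0.0
    for coords in ([x for x, _ in pins], [y for _, y in pins]):
        # Smooth maximum plus smooth maximum of the negated values (= minus smooth minimum).
        total += smooth_max(coords, alpha) + smooth_max([-v for v in coords], alpha)
    return total


pins = [(0, 0), (3, 4), (10, 1)]       # bounding-box length: 10 + 4 = 14
sharp = lse_length(pins, alpha=0.01)   # close to the bounding-box length
smooth = lse_length(pins, alpha=5.0)   # a visibly larger, smoother estimate
```

The smooth value always overestimates the bounding-box length, and the gap shrinks as $\alpha$ decreases.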

An alternative way is to approximate with $L_p$ norms (see [5]):

$$LP_x(N) := \Big(\sum_{p,q \in N} (x(p) - x(q))^p + \alpha\Big)^{1/p},$$

and $LP(N) := LP_x(N) + LP_y(N)$. $LP(N) \to BB(N)$ holds again in the limit $p \to \infty$.

We used exponential functions, in a way similar to the log-sum-exp model, as follows:

$$NL_x(N) = \frac{1}{\beta} \sum_{p \in N} \log\big(\exp(\beta x(p)) + \exp(-\beta x(p))\big),$$

and $NL(N) = NL_x(N) + NL_y(N)$. It is clear that $NL(N) \to BB(N)$ holds if $\beta \to \infty$. We use

$$\beta = \frac{MaxRounds}{MaxRounds - Round + 1},$$

where $MaxRounds$ is the number of rounds for which we want to run the algorithm. Formally, the code of this
subroutine is as follows:


Algorithm 5 smooth_edge(E)
C_x = (1/β) log(exp(β x(E)) + exp(-β x(E)))
C_y = (1/β) log(exp(β y(E)) + exp(-β y(E)))
Return C_x + C_y


Notice that after many rounds, the edge length tends to the actual bounding-box netlength.
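The following sketch combines the schedule for $\beta$ with the smoothed absolute value. It assumes that $x(E)$ and $y(E)$ are the coordinate differences `dx`, `dy` of the two endpoints of the edge, and it uses the algebraically equivalent form $|d| + \frac{1}{\beta}\log(1 + e^{-2\beta|d|})$ so that large values of $\beta |d|$ do not overflow. The function names are hypothetical.

```python
import math


def beta_schedule(round_no, max_rounds):
    """beta = MaxRounds / (MaxRounds - Round + 1); grows from 1 to MaxRounds."""
    return max_rounds / (max_rounds - round_no + 1)


def smooth_abs(d, beta):
    """(1/beta) * log(exp(beta*d) + exp(-beta*d)), rewritten to avoid overflow."""
    d = abs(d)
    return d + math.log1p(math.exp(-2.0 * beta * d)) / beta


def smooth_edge(dx, dy, round_no, max_rounds):
    """Smoothed length of an edge with coordinate differences dx and dy."""
    beta = beta_schedule(round_no, max_rounds)
    return smooth_abs(dx, beta) + smooth_abs(dy, beta)


early = smooth_edge(3.0, -4.0, round_no=1, max_rounds=100)   # beta = 1
late = smooth_edge(3.0, -4.0, round_no=100, max_rounds=100)  # beta = 100
```

In the first round the smoothed length is slightly above $|dx| + |dy|$, and in the last round the two agree to within rounding error.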

5.4 Possible remaining overlaps 

It is usually useful to stop the global placement before it removes all the overlaps. Our placer is ineffective in the 
very final stages of the algorithm, when the actual placement is almost legal, and only a few small overlaps should 
be eliminated. Therefore, we can get slightly better results if we stop the algorithm before the very final steps, and 
we use some other final legalization method, even a simple naive one. In our case, these final minor modifications 
were performed by hand. 









6 Conclusions 


In this paper we gave a heuristic algorithm for the NP-hard macro placement problem. The design of the algorithm
is based on a primal-dual approach to a matching problem (see Section 3, Problem 1).

First, we implemented a special data structure to handle the dual (cost) function efficiently during the algorithm.
It can record a multidimensional (in our case, 2-dimensional) discrete function, and it performs the
following two operations efficiently. It returns the sum (integral) of the values in any rectangle, and it can increase the
function by any constant in any rectangle. This data structure can also be useful for other purposes.

The second part includes the heuristics (see Section 5) inspired by the Hungarian Algorithm. We suggest an
algorithm that iteratively revises the primal and the dual functions. Although a pair of optimal primal-dual solutions
does not exist, this causes problems only around the finalization of the placement. This heuristic seemed to
perform well for finding good rough positions for the macros. Therefore, we used a natural continuous transition
from the primal-dual method to a simple algorithm which just enforces disjointness. There were many minor details
where we found nontrivial solutions which can be used in other problems as well. All these together provide a
flexible and robust algorithm for the VLSI placement problem, which can be easily optimized for different scenarios.